MG-RAST version 4: lessons learned from a decade of low-budget ultra-high-throughput metagenome analysis.
Abstract
As technologies change, MG-RAST is adapting. Newly available software is being incorporated to improve accuracy and performance. As a computational service that constantly runs large-volume scientific workflows, MG-RAST is well placed to perform benchmarking and to implement algorithmic or platform improvements, which in many cases involve trade-offs between specificity, sensitivity and run-time cost. The work in [Glass EM, Dribinsky Y, Yilmaz P, et al. ISME J 2014;8:1-3] is one example: we use existing, well-studied data sets representing different environments and different sequencing technologies as gold standards against which to evaluate any change to the pipeline. Currently, we use well-understood data sets already in MG-RAST as the platform for benchmarking. Using artificial data sets for pipeline performance optimization has not added value, because such data sets do not present the same challenges as real-world data sets. In addition, the MG-RAST team welcomes suggestions for improving the workflow. We are currently working on versions 4.02 and 4.1, both of which contain significant input from the community and our partners; they will enable double barcoding and stronger inferences supported by longer-read technologies, and will increase throughput while maintaining sensitivity by using Diamond and SortMeRNA. On the technical platform side, the MG-RAST team intends to support the Common Workflow Language as a standard for specifying bioinformatics workflows, both to facilitate development and to enable efficient, high-performance implementation of the community's data analysis tasks.
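To make the specificity/sensitivity/run-time trade-off concrete, here is a minimal sketch of scoring a pipeline version against a gold-standard data set. It assumes (hypothetically) that each run can be reduced to a set of predicted feature labels (for example taxa or functions), that the gold standard lists which labels are truly present, and that `universe` holds every label that could have been called. The function and class names are illustrative only; this is not MG-RAST's actual benchmarking code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    sensitivity: float  # fraction of gold-standard positives that were recovered
    specificity: float  # fraction of gold-standard negatives correctly not called
    runtime_s: float    # wall-clock run time recorded for this pipeline version


def score_against_gold_standard(predicted: set[str],
                                gold_positive: set[str],
                                universe: set[str],
                                runtime_s: float) -> BenchmarkResult:
    """Score one (hypothetical) pipeline run against a gold standard.

    Assumes predicted and gold_positive are subsets of universe; every label in
    universe that is not a gold-standard positive is treated as a true negative.
    """
    gold_negative = universe - gold_positive
    tp = len(predicted & gold_positive)   # correctly called labels
    fn = len(gold_positive - predicted)   # missed labels
    tn = len(gold_negative - predicted)   # correctly not called
    fp = len(predicted & gold_negative)   # spurious calls
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return BenchmarkResult(sensitivity, specificity, runtime_s)


if __name__ == "__main__":
    # Toy data: two hypothetical pipeline versions scored on the same gold standard.
    universe = {"A", "B", "C", "D", "E", "F"}
    gold = {"A", "B", "C"}
    runs = {
        "baseline": ({"A", "B", "C", "D"}, 120.0),
        "faster": ({"A", "B"}, 40.0),
    }
    for name, (predicted, runtime) in runs.items():
        r = score_against_gold_standard(predicted, gold, universe, runtime)
        print(f"{name}: sensitivity={r.sensitivity:.2f}, "
              f"specificity={r.specificity:.2f}, runtime={r.runtime_s:.0f}s")
```

In this toy comparison the faster version gives up some sensitivity in exchange for higher specificity and a third of the run time, which is the kind of trade-off that must be judged against real-world gold-standard data rather than artificial data sets.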
Similar resources
Analysis of metagenomics data
Improved sampling of diverse environments and advances in the development and application of next-generation sequencing technologies are accelerating the rate at which new metagenomes are produced. Over the past few years, the major challenge associated with metagenomics has shifted from generating to analyzing sequences. Metagenomic analysis includes the identification, and functional and evolu...
Analyzing Metagenomic Data: Inferring Microbial Community Function with Mg-rast
Application of massively parallel throughput DNA sequencing technologies to the generation of metagenomic datasets from environmental samples is presently transforming the field of microbiology. Whereas traditional (Sanger-based) DNA sequencing technology imparted a high economic cost on data generation, the development of “next-generation” technologies now make the large-scale generation of se...
The smallest cells pose the biggest problems: high-performance computing and the analysis of metagenome sequence data
New high-throughput DNA sequencing technologies have revolutionized how scientists study the organisms around us. In particular, microbiology – the study of the smallest, unseen organisms that pervade our lives – has embraced these new techniques to characterize and analyze the cellular constituents and use this information to develop novel tools, techniques, and therapeutics. So-called next-ge...
Planning and Budgeting for Nutrition Programs in Tanzania: Lessons Learned From the National Vitamin A Supplementation Program
Background Micronutrient deficiency in Tanzania is a significant public health problem, with vitamin A deficiency (VAD) affecting 34% of children aged 6 to 59 months. Since 2007, development partners have worked closely to advocate for the inclusion of twice-yearly vitamin A supplementation and deworming (VASD) activities with budgets at the subnational level, where funding and implementation o...
Shotgun metagenomic sequencing based microbial diversity assessment of Lasundra hot spring, India
This is the first report on the metagenomic approach for unveiling the microbial diversity of Lasundra hot spring, Gujarat State, India. High-throughput sequencing of community DNA was performed on an Ion Torrent PGM platform. The metagenome consisted of 606,867 sequences representing 98,567,305 bps, with an average length of 162 bps and 46% G + C content. Metagenome sequence information is availa...
Journal title:
- Briefings in bioinformatics
Publication date: 2017